-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13019][Docs] Replace example code in mllib-statistics.md using include_example #11108
Conversation
ok to test |
cc @yinxusen |
Test build #51145 has finished for PR 11108 at commit
|
Test build #51352 has finished for PR 11108 at commit
|
Test build #51602 has finished for PR 11108 at commit
|
Test build #51646 has finished for PR 11108 at commit
|
Test build #51800 has finished for PR 11108 at commit
|
|
||
object SummaryStatisticsExample { | ||
|
||
def main(args: Array[String]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def main(args: Array[String]):Unit = {
# $example off$ | ||
|
||
if __name__ == "__main__": | ||
sc = SparkContext(appName="HypothesisTestingKolmogorovSmirnovTestExample") # SparkContext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment
@keypointt @yinxusen I made one pass and left some comments inline. One issue is whether we want to print empty lines. It doesn't seem necessary to me for example code appearing in the user guide. There are also some redundant comments, which we should remove. But overall, this look great! |
@mengxr If there is no empty line, we will get a bulk of results which are hard to tell and recognize. I think it's a good experience for users that they can use |
I agree. But we don't really need lines just to print empty lines. For example: println(goodnessOfFitTestResult)
println() could be simply println(s"$goodnessOfFitTestResult\n") It is good to keep the example code short. |
Sure, that makes sense. Sent from my iPhone
|
Test build #53460 has finished for PR 11108 at commit
|
Hi @mengxr how do you like this change? |
LGTM. Merged into master. Thanks! |
Thank you! |
Seems this one breaks scala 2.10 compilation (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-sbt-scala-2.10/547/console). Can you fix it? |
Oh sure I'll try to fix it asap |
@yhuai I've just tested on my local machine, and there is no compile error and the example output results correctly. Also it seems all the parameters are right according to method documentation for sampleByKey() and sampleByKeyExact() http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions And the build before merging to master was also successful...I'm not quite sure why this master build failed... |
I reverted the change in master to unblock other builds. @keypointt Could you test it on Scala 2.10? It seems that Scala 2.10 doesn't resolve keyword arguments nicely: def f(a: String, b: String, c: String = "xxx") = {
println(a, b, c)
}
f(a = "abc", "def")
<console>:34: error: not enough arguments for method f: (a: String, b: String, c: String)Unit.
Unspecified value parameter b.
f(a = "abc", "def") We should either fix the seed or use keyword argument for |
Oh yes I just tried 2.10 and 2.11, where 2.11 build succeeded but 2.10 failed. |
@mengxr I've add keyword argument at my forked repo: keypointt@892fe60 should I open a separate PR? since new commit is not syncing to this PR |
… include_example https://issues.apache.org/jira/browse/SPARK-13019 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <[email protected]> Closes apache#11108 from keypointt/SPARK-13019.
Hi @mengxr , I'm using below command to compile this branch (after merge in latest master branch) by scala-2.10 and I'm always getting this error Line 157 in 48978ab
I'm assuming that master branch now is using scala-2.11 and when building 2.10 there could be some resolve issues? |
|
… mllib-statistics.md using include_example ## What changes were proposed in this pull request? This PR for ticket SPARK-13019 is based on previous PR(#11108). Since PR(#11108) is breaking scala-2.10 build, more work is needed to fix build errors. What I did new in this PR is adding keyword argument for 'fractions': ` val approxSample = data.sampleByKey(withReplacement = false, fractions = fractions)` ` val exactSample = data.sampleByKeyExact(withReplacement = false, fractions = fractions)` I reopened ticket on JIRA but sorry I don't know how to reopen a GitHub pull request, so I just submitting a new pull request. ## How was this patch tested? Manual build testing on local machine, build based on scala-2.10. Author: Xin Ren <[email protected]> Closes #11901 from keypointt/SPARK-13019.
https://issues.apache.org/jira/browse/SPARK-13019
The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.
Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.
{% include_example scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala %}
Jekyll will find
examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala
and pick code blocks marked "example" and replace code block in{% highlight %}
in the markdown.
See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337